Online Planning with Lookahead Policies

Neural Information Processing Systems

Real Time Dynamic Programming (RTDP) is an online algorithm based on Dynamic Programming (DP) that acts by 1-step greedy planning. Unlike DP, RTDP does not require access to the entire state space, i.e., it handles exploration explicitly. This makes RTDP particularly appealing when the state space is large and it is not possible to update all states simultaneously. In this work, we devise a multi-step greedy RTDP algorithm, which we call $h$-RTDP, that replaces the 1-step greedy policy with an $h$-step lookahead policy. We analyze $h$-RTDP in its exact form and establish that increasing the lookahead horizon, $h$, results in improved sample complexity, at the cost of additional computation. This is the first work that proves improved sample complexity as a result of {\em increasing} the lookahead horizon in online planning. We then analyze the performance of $h$-RTDP in three approximate settings: approximate model, approximate value updates, and approximate state representation. For these cases, we prove that the asymptotic performance of $h$-RTDP remains the same as that of a corresponding approximate DP algorithm, the best one can hope for without further assumptions on the approximation errors.
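To make the core idea concrete, here is a minimal sketch of $h$-step lookahead action selection over a tabular MDP. This is an illustration of the lookahead-policy concept only, not the paper's $h$-RTDP algorithm; the function name, data layout (`P[a][s][s2]` transition probabilities, `R[s][a]` rewards, `V` a value estimate used at the lookahead leaves), and discount factor are assumptions for the example.

```python
def h_step_lookahead_action(P, R, V, s, h, gamma=0.95):
    """Greedy action from state s via h-step lookahead (illustrative sketch).

    P[a][s][s2]: transition probability to s2 when taking action a in s
    R[s][a]:     immediate reward for action a in state s
    V[s]:        value estimate plugged in at the lookahead leaves
    """
    n_states, n_actions = len(V), len(P)

    def lookahead(state, depth):
        # Value of `state` under optimal behavior for `depth` more steps,
        # bootstrapping with V at the leaves (depth == 0).
        if depth == 0:
            return V[state]
        return max(
            R[state][a] + gamma * sum(
                P[a][state][s2] * lookahead(s2, depth - 1)
                for s2 in range(n_states))
            for a in range(n_actions))

    # Pick the action maximizing the h-step lookahead Q-value from s.
    return max(
        range(n_actions),
        key=lambda a: R[s][a] + gamma * sum(
            P[a][s][s2] * lookahead(s2, h - 1) for s2 in range(n_states)))
```

With $h=1$ this reduces to the standard 1-step greedy policy with respect to $V$; larger $h$ expands the full depth-$h$ search tree, which is the source of the additional computation the abstract refers to.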



The Value of Reward Lookahead in Reinforcement Learning

Neural Information Processing Systems

In reinforcement learning (RL), agents sequentially interact with changing environments while aiming to maximize the obtained rewards.




Review for NeurIPS paper: Online Planning with Lookahead Policies

Neural Information Processing Systems

Additional Feedback: COMMENTS AFTER REBUTTAL

Thank you for your response. However, in this paper's case I find that the significance of the paper (i.e., support for your claim that "theoretical results provided in this work are important on their own") is severely lacking without experiments showing a link between this theory and an algorithm's performance in terms of measures like running time, number of 1-step Bellman backups, etc.

***Note: this is not a claim that every theoretical paper needs experiments; it applies only to this specific work, due to the theory issues mentioned in the original review.***

The rebuttal's attempted arguments against providing experiments really miss the mark:

-- The rebuttal gives "Beyond the one step greedy approach in RL" as an example of a paper similar in the degree of its theoretical focus to this submission, but that paper actually has experiments! Light experiments could do the job; the "Beyond the one step greedy approach in RL" paper that you mentioned yourself is a case in point.

